BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes #21313
this would have to be for 0.24 (not convinced we should do this)
Do you think we should disallow list-likes in other cases? Or do you not find the discrepancy problematic?
Hello @toobaz! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 07, 2018 at 08:11 Hours UTC
Codecov Report
@@ Coverage Diff @@
## master #21313 +/- ##
==========================================
- Coverage 91.95% 91.95% -0.01%
==========================================
Files 160 160
Lines 49820 49818 -2
==========================================
- Hits 45812 45809 -3
- Misses 4008 4009 +1
doc/source/whatsnew/v0.23.1.txt
@@ -110,6 +110,8 @@ Bug Fixes
- Bug in :meth:`Series.reset_index` where appropriate error was not raised with an invalid level name (:issue:`20925`)
- Bug in :func:`interval_range` when ``start``/``periods`` or ``end``/``periods`` are specified with float ``start`` or ``end`` (:issue:`21161`)
- Bug in :meth:`MultiIndex.set_names` where error raised for a ``MultiIndex`` with ``nlevels == 1`` (:issue:`21149`)
- Bug in :meth:`DataFrame.__getitem__` and :meth:`DataFrame.loc` which did not accept columns keys passed as non-list iterables (:issue:`21294`)
would be for 0.24
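The whatsnew entry above is about column keys that are list-like but are not `list` objects. A minimal sketch of the behavior the fix targets, assuming a recent pandas release; sets and dicts are deliberately left out because newer pandas versions reject them as indexers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# A plain list has always worked as a column selector.
by_list = df[["a", "b"]]

# Non-list list-likes: an iterator, an Index, and a NumPy array (via .loc).
by_iter = df[iter(["a", "b"])]
by_index = df[pd.Index(["a", "b"])]
by_loc_array = df.loc[:, np.array(["a", "b"])]

# Each of them should select the same two columns as the plain list.
for result in (by_iter, by_index, by_loc_array):
    pd.testing.assert_frame_equal(result, by_list)
```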
pandas/core/frame.py
    if self.columns.nlevels > 1:
        return self._getitem_multilevel(key)
    return self._get_item_cache(key)
except (ValueError, TypeError):
    pass
what hits this exception here?
Lots of cases, for instance pd.DataFrame(index=range(3), columns=range(3))[['a', 'b']]
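In that example the shortcut's `key in self.columns` check hashes the key, so an unhashable list raises `TypeError` before any lookup happens, and the `except` lets `__getitem__` fall through to the collection-of-keys path. A small sketch, assuming a recent pandas where `Index.__contains__` hashes its argument first:

```python
import pandas as pd

df = pd.DataFrame(index=range(3), columns=range(3))
key = ["a", "b"]

# The membership test hashes the key first: a list is unhashable.
try:
    _ = key in df.columns
    shortcut_error = None
except TypeError as exc:
    shortcut_error = exc

# On the collection path, labels missing from the columns raise KeyError.
try:
    df[key]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc
```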
so a list-like key?
Yes, I guess so
pandas/core/frame.py
return self._get_item_cache(key)
# We are left with two options: a single key, and a collection of keys,
# We interpret tuples as collections only for non-MultiIndex
coll_key = is_list_like(key) and (not isinstance(key, tuple) or
can you take the inverse and name this is_single_key
a single tuple is a single key, yes? (MI or no)?
Right, I was assuming tuples were allowed as collections; luckily, they weren't.
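The classification agreed on here (a tuple is always a single key, MultiIndex or not) can be sketched with the public `pandas.api.types.is_list_like`. The wrapper function below is hypothetical; it only borrows the `is_single_key` name requested in the review and mirrors the expression that appears in a later excerpt of the PR:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def is_single_key(key) -> bool:
    """Hypothetical helper: tuples and non-list-likes count as a single
    key; every other list-like counts as a collection of keys."""
    return isinstance(key, tuple) or not is_list_like(key)


examples = {
    "scalar label": "a",
    "tuple": ("a", "x"),  # list-like, but still treated as one key
    "list": ["a", "b"],
    "iterator": iter(["a", "b"]),
    "Index": pd.Index(["a", "b"]),
    "ndarray": np.array(["a", "b"]),
}
classified = {name: is_single_key(key) for name, key in examples.items()}
```

Strings are not list-like for `is_list_like`, so a plain label falls on the single-key side without any special casing.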
pandas/core/frame.py

def _getitem_array(self, key):
if not coll_key:
    # This test preserves #9519; the second part preserves #21309
can you give a more informative comment
pandas/core/frame.py
elif len(key) != len(self.index):
    raise ValueError('Item wrong length %d instead of %d.' %
                     (len(key), len(self.index)))
# check_bool_indexer will throw exception if Series key cannot
blank line here
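The `ValueError` in the excerpt above guards boolean row masks passed to `DataFrame.__getitem__`. A short sketch of both sides of the length check, assuming a recent pandas (the exact message wording may vary between versions):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# A list of booleans is treated as a row mask; a mask shorter than the
# index raises ValueError.
try:
    df[[True, False]]
    message = None
except ValueError as exc:
    message = str(exc)

# A mask of the right length selects the rows marked True.
selected = df[[True, False, True]]
```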
@@ -501,9 +501,11 @@ def test_constructor_dict_of_tuples(self):
tm.assert_frame_equal(result, expected, check_dtype=False)

def test_constructor_dict_multiindex(self):
check = lambda result, expected: tm.assert_frame_equal(
can you parameterize this test?
Not here, but I can avoid fixing the lambda
indexer = np.arange(len(df.columns))[isna(df.columns)]

if len(indexer) == 1:
comment on each of these cases
(done)
tm.assert_series_equal(df.iloc[:, indexer[0]],
                       df.loc[:, np.nan])

# multiple nans should fail
# multiple nans should result in DataFrame
really?
really!
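The updated test comment can be checked directly: `.loc[:, np.nan]` returns a Series when exactly one column label is NaN, and a DataFrame when several are. A minimal sketch, assuming a recent pandas; the frames below are made up for illustration:

```python
import numpy as np
import pandas as pd

# One frame with a single NaN column label, one with two.
single = pd.DataFrame(np.arange(6).reshape(2, 3), columns=[1.0, np.nan, 3.0])
double = pd.DataFrame(
    np.arange(8).reshape(2, 4), columns=[1.0, np.nan, 3.0, np.nan]
)

one = single.loc[:, np.nan]   # exactly one NaN label -> Series
many = double.loc[:, np.nan]  # several NaN labels -> DataFrame
```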
@jreback ready as far as I'm concerned.
I have my doubts that we should do this; I commented on the relevant issue: #21294
Discussion (I think) concluded, and conflicts fixed by rebasing... ready as far as I'm concerned. Objections, @jreback?
@toobaz I had some comments that I hadn't submitted. Also, can you rebase on master?
if self.columns.is_unique and key in self.columns:
    if self.columns.nlevels > 1:
        return self._getitem_multilevel(key)
    return self._get_item_cache(key)
why are you changing to directly use _get_item_cache here rather than _getitem_column? (is it removed?)
Yes, removed. It did a uniqueness test that is no longer necessary, and it was misleading anyway, since it could not handle all the cases in which a single column is returned.
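Why a uniqueness test on the whole column index cannot decide the return type: a label that is itself unique still comes back as a single column (a Series) even when other labels are duplicated. A minimal sketch, assuming a recent pandas:

```python
import pandas as pd

unique_cols = pd.DataFrame([[1, 2]], columns=["a", "b"])
dup_cols = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])

from_unique = unique_cols["a"]  # unique columns, unique label -> Series
from_dup_a = dup_cols["a"]      # duplicated label -> DataFrame
from_dup_b = dup_cols["b"]      # non-unique columns, unique label -> Series
```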
indexer = convert_to_index_sliceable(self, key)
if indexer is not None:
    return self._getitem_slice(indexer)
why the change here?
removed?
Yes, removed; it was a pretty useless one-liner.
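The `convert_to_index_sliceable` branch in the excerpt above is where slice keys are handled: a slice passed to `DataFrame.__getitem__` selects rows, not columns. A small sketch of that user-facing behavior, assuming a recent pandas and a string row index chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])

# Integer slice on a non-integer index: positional, end excluded.
by_position = df[1:3]
# Label slice: label-based, both endpoints included.
by_label = df["x":"z"]
```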
is_single_key = isinstance(key, tuple) or not is_list_like(key)

if is_single_key:
    if self.columns.nlevels > 1:
isn’t this case handled by _getitem_multilevel (above)?
only if columns.is_unique
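With MultiIndex columns, the early shortcut only runs when the columns are unique, so a single key on non-unique MultiIndex columns reaches `_getitem_multilevel` through the `is_single_key` branch instead. A sketch of the resulting behavior, assuming a recent pandas; it also shows that a tuple on flat columns is treated as one (missing) label rather than as two labels:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
df = pd.DataFrame([[1, 2, 3]], columns=cols)
full_tuple = df[("a", "x")]  # complete key -> Series
first_level = df["a"]        # partial key -> sub-frame with level 0 dropped

# Non-unique MultiIndex columns: a partial key still returns the sub-frame.
dup_cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "x"), ("b", "y")])
dup_df = pd.DataFrame([[1, 2, 3]], columns=dup_cols)
dup_first_level = dup_df["a"]

# Flat columns: the tuple is a single label, so it is not found.
flat = pd.DataFrame([[1, 2]], columns=["a", "b"])
try:
    flat[("a", "b")]
    tuple_on_flat = "selected"
except KeyError:
    tuple_on_flat = "KeyError"
```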
if self.columns.nlevels > 1:
    return self._getitem_multilevel(key)
indexer = self.columns.get_loc(key)
if is_integer(indexer):
this is an argument for _take to accept a scalar integer
Not too sure. _take is such a fundamental method that there might be good reasons to keep it simple. Anyway, we can discuss this in some other issue.
Moreover, the problem should disappear when we fix #9519, that is, when the return type becomes predictable from the (non-)uniqueness of the index.
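The unpredictability discussed here comes from `Index.get_loc`, whose return type depends on the index: an integer for a unique label, a slice for a run of sorted duplicates, and a boolean mask otherwise. A sketch, assuming a recent pandas; it uses the public `DataFrame.take` rather than the private `_take` mentioned above, to show why an integer position is wrapped in a list:

```python
import numpy as np
import pandas as pd

unique = pd.Index(["a", "b", "c"])
sorted_dups = pd.Index(["a", "a", "b"])
unsorted_dups = pd.Index(["a", "b", "a"])

loc_unique = unique.get_loc("a")            # integer position
loc_sorted = sorted_dups.get_loc("a")       # slice over the duplicated run
loc_unsorted = unsorted_dups.get_loc("a")   # boolean mask

# Taking with a one-element list keeps the result two-dimensional,
# whereas positional access with a bare integer reduces to a Series.
df = pd.DataFrame([[1, 2, 3]], columns=unique)
as_frame = df.take([loc_unique], axis=1)
as_series = df.iloc[:, loc_unique]
```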
@jreback added a comment clarifying the
ok lgtm. can you add a whatsnew, maybe needs a subsection.
I don't think a subsection in the whatsnew is worth it, because the change is relatively marginal. But I do think we'll need to clarify list-likes in general in the docs; see #21784.
thanks @toobaz, nice!
…das-dev#21313) * BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes close pandas-dev#21294 close pandas-dev#21428
git diff upstream/master -u -- "*.py" | flake8 --diff
xref #21309, but worth fixing separately